BDSI 2021; University of Michigan
Reports
Slides
Manuscripts / books
R code and interpretations integrated into a single document
Separates the task of reporting the results from the task of formatting them:
decreases risk of copy-paste errors
decreases workload
Quickly create the same document in different formats, e.g. slides to show and handouts for the audience
Create websites
source: rstudio.com
whatever format you want to create: html, pdf, docx, …
pandoc: “an open-source document converter” (Wikipedia). Translates markup from one format, e.g. markdown, to another
md: a document written in markdown, “a lightweight markup language with plain text formatting syntax” (Wikipedia). GitHub also uses markdown.
knitr: an R package for creating reports directly in R. It translates your R Markdown document (.Rmd), including embedded R code, into a plain markdown document
.Rmd: file type recognized by RStudio. This is where everything goes: your header, R code chunks, and your content written in markdown
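The pipeline described above can be walked through by hand. The following is a minimal sketch, assuming the knitr package is installed (and, for the second step, rmarkdown and pandoc); the file names demo.Rmd, demo.md, and demo.html are hypothetical. In everyday use, rmarkdown::render() (or the Knit button) performs both steps in one call.

```r
library(knitr)

# Build a tiny, hypothetical .Rmd file in a temporary directory.
# The chunk fence is assembled with strrep() to avoid nesting backticks here.
fence <- strrep("`", 3)
work_dir <- tempfile("rmd-demo-")
dir.create(work_dir)
rmd_file <- file.path(work_dir, "demo.Rmd")
writeLines(
  c("# Demo", "", paste0(fence, "{r}"), "x <- 1 + 1", "x", fence),
  rmd_file
)

# Step 1 (knitr): run the embedded R code and write a plain markdown file
md_file <- knit(rmd_file, output = file.path(work_dir, "demo.md"), quiet = TRUE)
cat(readLines(md_file), sep = "\n")

# Step 2 (pandoc): translate the markdown into the target format.
# Skipped when rmarkdown or pandoc is not available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_file, to = "html",
                            output = file.path(work_dir, "demo.html"))
}
```

Changing the `to` argument (for example to "docx") is what lets the same source produce different output formats.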
From RStudio, go to File > New File > R Markdown...
R (https://cran.r-project.org/)
RStudio to interface with R (https://www.rstudio.com/)
01-exercise.Rmd
If you name a variable in an earlier code chunk, you can use it again in a later chunk.
library(tibble)
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))
foo <- tibble(x = x, y = y)

library(ggplot2)
ggplot(data = foo) + geom_point(aes(x, y))

foo
## # A tibble: 20 x 2
##          x      y
##      <dbl>  <dbl>
##  1  0.390   0.121
##  2 -0.965  -2.83
##  3  1.04    3.63
##  4  0.125  -0.399
##  5  0.170   0.965
##  6  1.56    3.62
##  7 -0.825  -2.56
##  8 -1.25   -2.87
##  9  0.555   2.53
## 10 -0.112  -0.698
## 11 -0.429  -1.56
## 12  0.0366  0.804
## 13  1.17    3.20
## 14 -0.506  -2.86
## 15  0.314  -0.244
## 16  2.18    6.35
## 17 -0.599  -0.231
## 18 -1.96   -5.88
## 19  0.292   0.546
## 20 -0.873  -2.07
| x | y |
|---|---|
| 0.38978 | 0.12069 |
| -0.96466 | -2.83252 |
| 1.04329 | 3.62949 |
| 0.12483 | -0.39871 |
| 0.16966 | 0.96463 |
| 1.56343 | 3.62120 |
| -0.82484 | -2.55974 |
| -1.25424 | -2.86593 |
| 0.55485 | 2.52962 |
| -0.11196 | -0.69774 |
| -0.42864 | -1.55763 |
| 0.03657 | 0.80367 |
| 1.17391 | 3.19856 |
| -0.50556 | -2.85799 |
| 0.31370 | -0.24402 |
| 2.17790 | 6.35158 |
| -0.59893 | -0.23119 |
| -1.96191 | -5.88256 |
| 0.29225 | 0.54614 |
| -0.87258 | -2.06756 |
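A markdown table like the one above can be generated from a data frame rather than typed by hand. This is a minimal sketch, assuming a function such as knitr::kable() is used (the slides do not say how the table was produced); `toy` is a hypothetical stand-in for `foo`.

```r
library(knitr)

# Hypothetical stand-in for `foo`: 20 simulated (x, y) pairs
set.seed(2021)
x <- rnorm(20)
toy <- data.frame(x = x, y = 3 * x + rnorm(length(x)))

# format = "pipe" produces a pipe-delimited markdown table;
# digits controls how many decimal places are kept
toy_table <- kable(toy, format = "pipe", digits = 5)
toy_table
```

Inside an .Rmd file the same call goes in an ordinary R chunk, and the table is rendered in the knitted document.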
Use #, ##, ###, etc. for progressively deeper header levels
Use *, +, or - for bulleted (unordered) lists
Use (i), (a), or 1. for ordered lists
Use *{text}* for italics, **{text}** for bold
If something (a new header option, a code chunk, etc) is not working as you expect, try adding an additional linebreak
If experimenting with a new feature, re-knit frequently
If, like me, you become a compulsive re-knitter, the code chunk option cache = TRUE is both useful and dangerous.
```{r, cache = TRUE}
(some intensive task)
```
As long as you don’t change anything in the chunk, you won’t need to re-run the intensive task upon re-knitting. However, things can go awry…
Open the file caching_mishap.Rmd and make sure you understand the intended behavior (should be trivial!)
Knit the document
Now edit your first chunk, changing to x <- rnorm(n = 1, mean = 100) and leaving the second chunk alone
Re-knit your document
That’s how we get results like this:
x <- rnorm(n = 1, mean = 100);
x;
## [1] 2.7449
We triggered a re-cache of the first chunk without triggering a re-cache of the second
Cache with caution and only cache costly chunks
Think about when and where you want to split your chunk
For chunks that may be susceptible, trigger a re-cache by adding a comment character (#) at the end of a line, or making some other innocuous change to your chunk. Even extra white space will trigger a re-cache
To clear the cache entirely, go to Knit > Clear Knitr Cache… or directly delete the folder named [filename]_cache in your working directory
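One further way to reduce the risk of a stale downstream chunk is to let knitr track dependencies between cached chunks. This is a sketch of a setup chunk, assuming knitr's autodep chunk option suits your document. It is not part of the original exercise, and it relies on knitr detecting which variables each chunk uses, so it does not replace the cautions above.

```r
library(knitr)

# Place in the first (setup) chunk of the .Rmd file.
# autodep = TRUE asks knitr to work out which cached chunks use variables
# created in earlier cached chunks, so that a change upstream also
# invalidates the cache of the chunks downstream.
opts_chunk$set(autodep = TRUE)

# Confirm the default now in effect
opts_chunk$get("autodep")
```

Alternatively, a single cached chunk can name its upstream chunk explicitly with the dependson chunk option, e.g. dependson = "first", where "first" is a hypothetical chunk label.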
knitr can run code in other languages, including:
Python
SQL
Julia
Stan
JavaScript
Use ```{python} to start a Python code chunk, ```{julia} to start a Julia code chunk, ```{bash} to start a shell script, etc.
You may need external language engines to successfully call other languages. I have not used this functionality before.
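To see which language engines your knitr installation knows about, you can inspect its engine registry. This is a small sketch assuming only that knitr is installed. An engine appearing in the list does not mean the corresponding interpreter (Python, Julia, and so on) is installed on your machine.

```r
library(knitr)

# Names of the language engines registered in this knitr installation
engines <- names(knit_engines$get())
sort(engines)

# Check a few of the languages listed above
c("python", "julia", "stan", "sql", "bash") %in% engines
```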
R scripts!
You are not limited to using Markdown in Rmd files: you can also knit R scripts using the same shortcut, Cmd+Shift+K / Ctrl+Shift+K
Use #' to indicate a switch to markdown
Use #+ to start a new chunk
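A minimal sketch of what such a script might look like (the content is hypothetical and is not taken from 02-exercise.R). It runs as an ordinary R script, and when knitted the #' lines become markdown text while the #+ lines become chunk headers.

```r
#' # Simulated regression
#'
#' Lines that start with `#'` are rendered as markdown text.

#+ simulate
set.seed(1)
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))

#' Variables from the earlier chunk are still available here.

#+ fit, echo = TRUE
fit <- lm(y ~ x)
coef(fit)
```

Chunk labels and options after #+ follow the same syntax as in an .Rmd chunk header.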
Open 02-exercise.R and complete the tasks. Indicate when you are done.
readr package
Part of the tidyverse (along with dplyr and ggplot2):
readr gives you tools to read data into R from external files, where it can be wrangled and manipulated, and then to write the results back out to external files.
The workhorse of the readr package is read_csv, which reads a comma-separated values (csv) file into R as a tibble (a type of data.frame). From the help page:
read_csv(file, col_names = TRUE, col_types = NULL, locale = default_locale(),
na = c("", "NA"), quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE,
skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = show_progress(),
skip_empty_rows = TRUE)
Typical use is my_data <- read_csv("my_files_path.csv")
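A few of the arguments in the signature above are worth setting explicitly. This is a minimal sketch, assuming the readr package is installed. The file contents and column names are hypothetical and are written to a temporary file so the example is self-contained.

```r
library(readr)

# Hypothetical toy data written to a temporary csv file
path <- tempfile(fileext = ".csv")
writeLines(c("id,day,size", "101,0,41.8", "101,3,NA", "102,0,"), path)

# col_types fixes the column types instead of letting readr guess them;
# na lists the strings that should be read as missing values
toy <- read_csv(
  path,
  col_types = cols(id = col_integer(), day = col_integer(), size = col_double()),
  na = c("", "NA")
)
toy

# After wrangling, write_csv() sends the data back out to a file
out_path <- tempfile(fileext = ".csv")
write_csv(toy, out_path)
```

Declaring col_types up front also silences the column-specification message that read_csv prints when it has to guess.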
tumor_growth.csv
Varna M, Bertheau P, Legres LG. Tumor Microenvironment in Human Tumor Xenografted Mouse Models. Journal of Analytical Oncology 2014; 3(3): 159-166.
(tumor_growth <- read_csv("tumor_growth.csv"))
## # A tibble: 574 x 5
##    Grp   Group    ID   Day   Size
##    <chr> <dbl> <dbl> <dbl>  <dbl>
##  1 1.CTR     1   101     0   41.8
##  2 1.CTR     1   101     3   85
##  3 1.CTR     1   101     4  114
##  4 1.CTR     1   101     5  162.
##  5 1.CTR     1   101     6  178.
##  6 1.CTR     1   101     7  325
##  7 1.CTR     1   101    10  624.
##  8 1.CTR     1   101    11  648.
##  9 1.CTR     1   101    12  836.
## 10 1.CTR     1   101    13 1030.
## # … with 564 more rows
dplyr knowledge
tumor_growth %>%
  filter(Day %in% c(0, 14)) %>%
  group_by(Grp, Day) %>%
  summarize(mean_Size = mean(Size))

tumor_growth %>%
  filter(Day %in% c(0, 7, 14)) %>%
  group_by(Grp, Day) %>%
  summarize(mean_Size = mean(Size),
            sd_Size = sd(Size))

tumor_growth %>%
  filter(Grp == "1.CTR") %>%
  group_by(ID) %>%
  summarize(n = n()) %>%
  summarize(n = mean(n)) %>%
  pull(n) # pull
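To check your reading of pipelines like these without the real data, it can help to run them on a tiny table where the answers can be worked out by hand. This is a sketch on hypothetical data, assuming the dplyr package is installed. The values and the "2.TRT" group label are invented for illustration and are not from tumor_growth.csv.

```r
library(dplyr)

# Hypothetical miniature version of the tumor data: two groups, two mice each
toy_growth <- tibble(
  Grp  = rep(c("1.CTR", "2.TRT"), each = 4),
  ID   = c(101, 101, 102, 102, 201, 201, 202, 202),
  Day  = rep(c(0, 14), times = 4),
  Size = c(40, 900, 50, 1100, 45, 300, 55, 500)
)

# Mean size per group and day: one row per (Grp, Day) combination
toy_means <- toy_growth %>%
  filter(Day %in% c(0, 14)) %>%
  group_by(Grp, Day) %>%
  summarize(mean_Size = mean(Size), .groups = "drop")
toy_means

# Average number of measurements per mouse in the control group;
# each hypothetical control mouse has 2 rows, so the average is 2
toy_n <- toy_growth %>%
  filter(Grp == "1.CTR") %>%
  group_by(ID) %>%
  summarize(n = n()) %>%
  summarize(n = mean(n)) %>%
  pull(n)
toy_n
```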
R Markdown: The Definitive Guide
RStudio experts